Underdamped Langevin Dynamics

Accelerating Langevin Monte Carlo Sampling: A Large Deviations Analysis

Yao, Nian, Ali, Pervez, Tao, Xihua, Zhu, Lingjiong

arXiv.org Machine Learning

Langevin algorithms are popular Markov chain Monte Carlo methods that are often used to solve high-dimensional large-scale sampling problems in machine learning. The most classical Langevin Monte Carlo algorithm is based on the overdamped Langevin dynamics. There are many variants of Langevin dynamics that often show superior performance in practice. In this paper, we provide a unified approach to study the acceleration of the variants of the overdamped Langevin dynamics through the lens of large deviations theory. Numerical experiments using both synthetic and real data are provided to illustrate the efficiency of these variants.
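The baseline discussed here, overdamped Langevin Monte Carlo, is commonly implemented as the unadjusted Langevin algorithm (Euler-Maruyama discretization of the overdamped dynamics). A minimal sketch, targeting a standard Gaussian for illustration (the step size and target are choices of this example, not from the paper):

```python
import numpy as np

def ula_sample(grad_log_pi, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm:
    x_{k+1} = x_k + h * grad log pi(x_k) + sqrt(2h) * xi_k, xi_k ~ N(0, I)."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Target: standard Gaussian, so grad log pi(x) = -x.
rng = np.random.default_rng(0)
samples = ula_sample(lambda x: -x, x0=[5.0], step=0.05, n_steps=20000, rng=rng)
burned = samples[2000:]  # discard burn-in
print(burned.mean(), burned.var())  # close to the target's mean 0 and variance 1
```

The fixed step size introduces a small bias in the stationary distribution; the accelerated variants the paper studies modify the dynamics rather than this basic update.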


Fisher information dissipation for time inhomogeneous stochastic differential equations

Feng, Qi, Zuo, Xinzhe, Li, Wuchen

arXiv.org Artificial Intelligence

We provide a Lyapunov convergence analysis for time-inhomogeneous variable coefficient stochastic differential equations (SDEs). Three typical examples include overdamped, irreversible drift, and underdamped Langevin dynamics. We first formulate the probability transition equation of Langevin dynamics as a modified gradient flow of the Kullback-Leibler divergence in the probability space with respect to time-dependent optimal transport metrics. This formulation contains both gradient and non-gradient directions depending on a class of time-dependent target distributions. We then select a time-dependent relative Fisher information functional as a Lyapunov functional. We develop a time-dependent Hessian matrix condition, which guarantees the convergence of the probability density function of the SDE. We verify the proposed conditions for several time-inhomogeneous Langevin dynamics. For the overdamped Langevin dynamics, we prove the $O(t^{-1/2})$ convergence in $L^1$ distance for the simulated annealing dynamics with a strongly convex potential function. For the irreversible drift Langevin dynamics, we prove an improved convergence towards the target distribution in an asymptotic regime. We also verify the convergence condition for the underdamped Langevin dynamics. Numerical examples demonstrate the convergence results for the time-dependent Langevin dynamics.
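The simulated annealing dynamics mentioned above is a time-inhomogeneous overdamped Langevin SDE whose noise level decays over time. A minimal sketch with a logarithmic inverse-temperature schedule and a strongly convex potential (both the schedule and the potential are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def annealed_langevin(grad_V, x0, step, n_steps, rng, beta_of_t):
    """Euler-Maruyama discretization of time-inhomogeneous overdamped Langevin:
    dX_t = -grad V(X_t) dt + sqrt(2 / beta(t)) dW_t,
    where the inverse temperature beta(t) grows so the law concentrates on argmin V."""
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        t = (k + 1) * step
        noise = np.sqrt(2.0 * step / beta_of_t(t)) * rng.standard_normal(x.size)
        x = x - step * grad_V(x) + noise
    return x

# Strongly convex potential V(x) = 0.5 * ||x - 1||^2, minimized at x = 1.
rng = np.random.default_rng(1)
x_final = annealed_langevin(lambda x: x - 1.0, x0=[4.0], step=0.01, n_steps=50000,
                            rng=rng, beta_of_t=lambda t: 1.0 + np.log(1.0 + t))
print(x_final)  # concentrates near the minimizer x = 1 as beta(t) grows
```

With constant beta this reduces to ordinary overdamped Langevin sampling; the slowly growing schedule is what makes the dynamics time-inhomogeneous.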


Mean-field Underdamped Langevin Dynamics and its Space-Time Discretization

Fu, Qiang, Wilson, Ashia

arXiv.org Machine Learning

A popular approach for finding a minimizer of an EMO problem is based on the mean-field Langevin dynamics. When the problem is convex, Hu et al. (2019) show the mean-field Langevin dynamics converges to the minimizer asymptotically; and when the problem satisfies a uniform logarithmic Sobolev inequality, several works have established an exponentially fast convergence (Chizat, 2022; Nitanda et al., 2022; Chen et al., 2022). The mean-field Langevin dynamics, however, cannot be implemented, and bridging this gap requires both space and time discretizations of the dynamics. A time-discretization of the mean-field Langevin dynamics was analyzed by Nitanda et al. (2022), who extend the interpolation argument introduced by Vempala and Wibisono (2019) to a non-linear Fokker-Planck equation. They establish the non-asymptotic convergence of the time-discretized dynamics in the energy gap. A space-discretization was studied by Chen et al. (2022), who show that the finite-particle approximation to the density of the mean-field Langevin dynamics (referred to as a uniform-in-time propagation of chaos) converges exponentially fast, with a bias related to the number of particles. More practically, Suzuki et al. (2023) analyze a space-time discretization of the mean-field Langevin dynamics and establish its non-asymptotic convergence.
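The space discretization referred to above replaces the mean-field law by an empirical measure over finitely many interacting particles. A toy sketch with a linear interaction through the empirical mean (the potential, interaction, and entropic regularization strength are all assumptions of this example, not the paper's setting):

```python
import numpy as np

def mean_field_langevin(particles, step, n_steps, lam, rng):
    """Finite-particle (space) + Euler (time) discretization of mean-field
    Langevin dynamics: each particle feels a confining gradient plus an
    interaction with the empirical mean, plus entropic noise of level lam."""
    x = np.array(particles, dtype=float)
    for _ in range(n_steps):
        m = x.mean()                        # the empirical measure enters via its mean
        drift = -x - 0.5 * (x - m)          # confinement toward 0 + attraction to the mean
        x = x + step * drift + np.sqrt(2.0 * lam * step) * rng.standard_normal(x.size)
    return x

rng = np.random.default_rng(2)
x = mean_field_langevin(rng.standard_normal(512), step=0.01, n_steps=5000, lam=0.1, rng=rng)
print(x.mean(), x.var())  # the particle cloud settles near its mean-field stationary law
```

The finite-particle bias discussed in the abstract corresponds to the gap between this empirical system and the true mean-field law; it shrinks as the number of particles grows.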


Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo

Wang, Ziyi, Chen, Yujie, Song, Qifan, Zhang, Ruqi

arXiv.org Machine Learning

Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $\epsilon$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves quadratic improvement ($\widetilde{\mathbf{O}}\left({\epsilon^{-2}{\mu^*}^{-2}\log^2\left({\epsilon^{-1}}\right)}\right)$) compared to the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\widetilde{\mathbf{O}}\left({{\epsilon}^{-4}{\lambda^{*}}^{-1}\log^5\left({\epsilon^{-1}}\right)}\right)$). Moreover, we prove that low-precision SGHMC is more robust to the quantization error compared to low-precision SGLD due to the robustness of the momentum-based update w.r.t. gradient noise. Empirically, we conduct experiments on synthetic data, and {MNIST, CIFAR-10 \& CIFAR-100} datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning.
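The momentum-based update underlying SGHMC can be sketched as follows. This is a simplified illustration, not the paper's algorithm: gradients here are clean (full-batch) rather than stochastic, and low precision is modeled crudely by stochastic rounding of the parameters to a fixed-point grid of spacing `delta`:

```python
import numpy as np

def sghmc(grad_log_p, x0, step, friction, n_steps, rng, delta=None):
    """Toy SGHMC: Langevin update with momentum; if delta is given, parameters
    are kept on a fixed-point grid via stochastic rounding (unbiased in mean)."""
    def q(v):
        if delta is None:
            return v
        lo = np.floor(v / delta) * delta
        return lo + delta * (rng.random(v.shape) < (v - lo) / delta)

    x = np.array(x0, dtype=float)
    r = np.zeros_like(x)  # momentum
    out = np.empty((n_steps, x.size))
    for k in range(n_steps):
        # momentum update: friction, gradient kick, injected noise
        r = (1.0 - friction * step) * r + step * grad_log_p(x) \
            + np.sqrt(2.0 * friction * step) * rng.standard_normal(x.size)
        x = q(x + step * r)  # position update, then (optional) quantization
        out[k] = x
    return out

rng = np.random.default_rng(3)
xs = sghmc(lambda x: -x, x0=[3.0], step=0.05, friction=1.0,
           n_steps=40000, rng=rng, delta=1.0 / 64)
tail = xs[5000:]
print(tail.mean(), tail.var())  # near 0 and 1 for the standard Gaussian target
```

The robustness claim in the abstract corresponds to the fact that quantization noise enters the position update and is then averaged through the momentum, rather than feeding directly back into the gradient step as in SGLD.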


Unbiased Estimation using Underdamped Langevin Dynamics

Ruzayqat, Hamza, Chada, Neil K., Jasra, Ajay

arXiv.org Machine Learning

In this work we consider the unbiased estimation of expectations w.r.t.~probability measures that have non-negative Lebesgue density, and which are known point-wise up to a normalizing constant. We focus upon developing an unbiased method via the underdamped Langevin dynamics, which has proven to be popular of late due to applications in statistics and machine learning. Specifically, in continuous time the dynamics can be constructed so that, as time goes to infinity, they admit the probability of interest as a stationary measure. In many cases, time-discretized versions of the underdamped Langevin dynamics are used in practice, run only for a fixed number of iterations. We develop a novel scheme based upon doubly randomized estimation as in \cite{ub_grad,disc_model}, which requires access only to time-discretized versions of the dynamics. The proposed scheme aims to remove both the discretization bias and the bias resulting from running the dynamics for a finite number of iterations. We prove, under standard assumptions, that our estimator is of finite variance and either has finite expected cost, or has finite cost with a high probability. To illustrate our theoretical findings we provide numerical experiments which verify our theory, including challenging examples from Bayesian statistics and statistical physics.
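The debiasing device behind such doubly randomized schemes is the single-term (Glynn-Rhee-type) estimator: randomize over discretization levels and reweight by the level probability, so that the biased approximations telescope to their limit in expectation. A minimal sketch on a deterministic stand-in sequence `a(l)` (in the paper's setting `a(l)` would be an expectation under the level-`l` discretized underdamped dynamics, coupled across levels; that coupling is omitted here):

```python
import numpy as np

def single_term_estimator(a, p_geo, rng):
    """Single-term debiasing: draw a level L ~ Geometric(p_geo) (support 1, 2, ...)
    and return a(0) + (a(L) - a(L-1)) / P(L); this is unbiased for lim_l a(l)."""
    L = rng.geometric(p_geo)
    prob = p_geo * (1.0 - p_geo) ** (L - 1)
    return a(0) + (a(L) - a(L - 1)) / prob

# Toy "discretization limit": a(l) = 1 - 2^{-l}, whose limit is exactly 1.
rng = np.random.default_rng(4)
est = np.mean([single_term_estimator(lambda l: 1.0 - 2.0 ** (-l), 0.6, rng)
               for _ in range(200000)])
print(est)  # averages to ~1, the limit, even though every a(l) is biased
```

Finite variance requires the level differences to decay fast enough relative to the tail of the level distribution, which is exactly the kind of condition the paper verifies for its estimator.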


Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations

Sanz-Serna, J. M., Zygalakis, Konstantinos C.

arXiv.org Machine Learning

We present a framework that allows for the non-asymptotic study of the $2$-Wasserstein distance between the invariant distribution of an ergodic stochastic differential equation and the distribution of its numerical approximation in the strongly log-concave case. This allows us to study in a unified way a number of different integrators proposed in the literature for the overdamped and underdamped Langevin dynamics. In addition, we analyse a novel splitting method for the underdamped Langevin dynamics which only requires one gradient evaluation per time step. Under an additional smoothness assumption on a $d$-dimensional strongly log-concave distribution with condition number $\kappa$, the algorithm is shown to produce, with $\mathcal{O}\big(\kappa^{5/4} d^{1/4}\epsilon^{-1/2} \big)$ complexity, samples from a distribution that, in Wasserstein distance, is at most $\epsilon>0$ away from the target distribution.
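To illustrate the "one gradient evaluation per step" idea for underdamped Langevin splitting schemes, here is a sketch of the well-known BAOAB integrator (not the paper's novel scheme): the closing half-kick of one step reuses the gradient that opens the next step, so each iteration costs a single new gradient. Unit mass and unit temperature are assumed:

```python
import numpy as np

def baoab(grad_V, x0, step, gamma, n_steps, rng):
    """BAOAB splitting for underdamped Langevin dynamics:
    B = half velocity kick, A = half position drift, O = exact Ornstein-Uhlenbeck
    step on the velocity. Only one new gradient evaluation per iteration."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    c = np.exp(-gamma * step)          # OU contraction factor over one step
    g = grad_V(x)                      # gradient carried over between steps
    out = np.empty((n_steps, x.size))
    for k in range(n_steps):
        v = v - 0.5 * step * g                                          # B
        x = x + 0.5 * step * v                                          # A
        v = c * v + np.sqrt(1.0 - c * c) * rng.standard_normal(x.size)  # O
        x = x + 0.5 * step * v                                          # A
        g = grad_V(x)               # the single gradient evaluation of this step
        v = v - 0.5 * step * g                                          # B
        out[k] = x
    return out

# Standard Gaussian target: V(x) = ||x||^2 / 2, so grad V(x) = x.
rng = np.random.default_rng(5)
xs = baoab(lambda x: x, x0=[2.0], step=0.1, gamma=1.0, n_steps=30000, rng=rng)
tail = xs[3000:]
print(tail.mean(), tail.var())  # near 0 and 1 for the standard Gaussian target
```

The Wasserstein framework in the paper quantifies precisely how far the invariant law of such a discretization sits from the target as a function of the step size.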